18 research outputs found

    Anomaly Detection on Social Data

    The advent of online social media including Facebook, Twitter, Flickr and YouTube has drawn massive attention in recent years. These online platforms generate massive amounts of data capturing the behavior of multiple types of human actors as they interact with one another and with resources such as pictures, books and videos. Unfortunately, the openness of these platforms often leaves them highly susceptible to abuse by suspicious entities such as spammers. It therefore becomes increasingly important to automatically identify these suspicious entities and eliminate their threats. We call these suspicious entities anomalies in social data, as they often hold different agendas from normal entities and manifest anomalous behaviors. In this dissertation, we are interested in two kinds of anomalous behaviors in social data, namely unusual coalitions among a collection of entities and unusual conflicting opinions among entities. The two kinds of anomalous behaviors lead us to define two types of anomalies, namely, anomaly collections of the same entity type and anomalous nodes of different entity types in bipartite graphs. This dissertation introduces two anomaly collection definitions, namely, Extreme Rank Anomalous Collection (or ERAC) and Coherent Anomaly Collection (or CAC). An ERAC is a set of entities that cluster toward the top or bottom ranks when all entities in the population are ranked on certain features. We propose a statistical model to quantify the anomalousness of an ERAC, and present exact as well as heuristic algorithms for finding top-K ERACs. We then propose the follow-up problem of expanding top-K ERACs to anomalous supersets. We apply the algorithms for ERAC detection and expansion to both synthetic and real-life datasets, including web spam, IMDB and Chinese online forum datasets. Results show that our algorithms achieve higher precision than existing spam and anomaly detection methods.
    CAC is defined based on ERAC, emphasizing the coherence among members of an ERAC. As top-K ERACs often overlap with each other, for applications where disjoint anomaly collections are of interest, we propose to find top-K disjoint CACs with exact and heuristic algorithms. Experiments on both synthetic and real-life datasets, including Twitter, web spam, and Chinese online forum datasets, show that our approach discovers not only injected anomaly collections in synthetic datasets but also real-life coherent collections of hashtag spammers, web spammers and opinion spammers, which are hard to detect with clustering-based methods. We detect the second type of anomalies in a bipartite graph, where nodes in one partite set represent human actors, nodes in the other partite set represent resources, and edges carry the agreeing and disagreeing opinions from human actors to resources. The anomalousness of nodes in one partite set depends on that of their connected nodes in the other partite set. Previous studies have shown that this mutual dependency can be positive or negative. We integrate both mutual dependency principles to model the anomalous behavior of nodes. We formulate our principles and design an iterative algorithm to simultaneously compute the anomaly scores of nodes in both partite sets. Our method is applied to synthetic graphs, and the results show that our algorithm outperforms existing ones that use only the positive or only the negative mutual dependency principle. Results on two real-life datasets, namely Goodreads and Buzzcity, show that our method is able to detect suspected spammed books in Goodreads and fraudulent publishers in mobile advertising networks with higher precision than existing approaches.
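    The abstract does not give the update rules of the iterative algorithm, so the following is only a minimal sketch of the general idea of mutually dependent scores on a signed bipartite graph. It is a generic trust-weighted consensus iteration, not the dissertation's formulation: it scores only the actors, and it does not model the positive and negative dependency principles separately. The function name `trust_weighted_scores`, the edge format and the sample data are all hypothetical.

```python
def trust_weighted_scores(edges, n_iter=50, tol=1e-9):
    """Illustrative only: edges is a list of (actor, resource, opinion),
    with opinion +1 (agree) or -1 (disagree).

    Returns (actor_score, consensus), where actor_score is in [0, 1]
    (higher = more anomalous) and consensus is in [-1, 1].
    """
    actors = {a for a, _, _ in edges}
    resources = {r for _, r, _ in edges}
    actor_score = {a: 0.0 for a in actors}
    consensus = {r: 0.0 for r in resources}

    for _ in range(n_iter):
        # Step 1: resource consensus is the opinion average, weighted by
        # how much each actor is currently trusted (1 - anomaly score).
        num = {r: 0.0 for r in resources}
        den = {r: 0.0 for r in resources}
        for a, r, opinion in edges:
            weight = 1.0 - actor_score[a]
            num[r] += weight * opinion
            den[r] += weight
        consensus = {r: (num[r] / den[r] if den[r] > 0 else 0.0)
                     for r in resources}

        # Step 2: actor anomaly is the mean disagreement with the
        # consensus of the resources it rated, scaled to [0, 1].
        total = {a: 0.0 for a in actors}
        count = {a: 0 for a in actors}
        for a, r, opinion in edges:
            total[a] += abs(opinion - consensus[r]) / 2.0
            count[a] += 1
        new_score = {a: total[a] / count[a] for a in actors}

        # Stop once the scores no longer change noticeably.
        delta = max(abs(new_score[a] - actor_score[a]) for a in actors)
        actor_score = new_score
        if delta < tol:
            break

    return actor_score, consensus


# Hypothetical data: three actors agree on two books, one actor dissents.
edges = [
    ("u1", "b1", +1), ("u2", "b1", +1), ("u3", "b1", +1), ("u4", "b1", -1),
    ("u1", "b2", +1), ("u2", "b2", +1), ("u3", "b2", +1), ("u4", "b2", -1),
]
actor_score, consensus = trust_weighted_scores(edges)
print(sorted(actor_score.items(), key=lambda kv: -kv[1]))
```

    In this toy example the dissenting actor ends up with the highest score because each round lowers its weight in the consensus, which in turn increases its measured disagreement. A dissenter is not necessarily a spammer, which is one reason a single dependency principle can be insufficient.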

    Mining coherent anomaly collections on web data

    Visual analytics for supporting entity relationship discovery on text data

    A*STAR Public Sector R

    Failure Analysis and Experimental Research on the Structure of Key Parts of Safety Valves

    To design a safety valve with a high safety factor, stress analysis based on theoretical calculations and ANSYS analysis is performed on the key components of the new nuclear A48 safety valve under open and closed conditions to determine whether the safety valve fails. The natural frequencies of the safety valve and of the spring are analyzed and tested to assess seismic performance. The spring parameters are calculated according to spring design specifications to determine the method for calculating and analyzing spring failure. A lateral deflection measurement bench was built to verify the design method against experimental data and to provide a reliable theoretical basis for the design of safety valves with high safety factors.